README for the Linux extended file system defragmenter

edefrag emergency release 0.3b alpha

Copyright Stephen C. Tweedie, 1992, 1993 (sct@dcs.ed.ac.uk)

Parts Copyright Remy Card, 1992 (card@masi.ibp.fr)
Parts Copyright Linus Torvalds, 1992 (torvalds@kruuna.helsinki.fi)

This file and the accompanying program may be redistributed under the
terms of the GNU General Public License.


INTRODUCTION: What does it do?
==============================

As a file system is used, data tends to become more and more scattered
over the disk, degrading performance.  A disk defragmenter simply
reorganises the data on the disk, so that individual files occupy a
single sequential set of disk blocks, and all the free space on the
disk is collected together in a single region.  This generally means
that reading a whole file is more efficient.

The extended file system stores a list of unused disk blocks in a
series of unused blocks scattered over the disk (the "free list").
When blocks are required to store data, they are removed from the head
of the list, and are added back when released (by unlinking or
truncating a file).

However, only the free blocks stored at the head of the list are
available to the extfs at any time.  This means that not all the free
space is known to the extfs when it tries to find a free block; as a
result, it does not always find the most efficient way to use free
space.

This is in contrast to the minix file system, in which free space is
stored in a single bitmap, and the file system can allocate free space
from anywhere on the disk.
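
To make the contrast concrete, here is a rough C sketch of the two
allocation strategies.  The structures and names below are invented
for this README (they are not the real file system code); the point is
simply that the free-list allocator can only hand out whatever block
numbers happen to sit in the list block it currently holds, whereas
the bitmap allocator can see every free block on the disk and pick a
well-placed one.

    /* Sketch only: invented structures, not the real fs code. */

    #define FREE_PER_BLOCK 254       /* entries per list block */

    struct free_list_block {
        unsigned long count;         /* entries currently in use */
        unsigned long next;          /* block holding the next part of the list */
        unsigned long free[FREE_PER_BLOCK];
    };

    /* extfs-style: allocate whatever is at the head of the free list. */
    unsigned long ext_alloc_block(struct free_list_block *head)
    {
        if (head->count == 0)
            return 0;                /* caller must load the next list block */
        return head->free[--head->count];
    }

    /* minix-style: scan a bitmap, so a block near 'goal' can be chosen. */
    long bitmap_alloc_block(unsigned char *map, long nblocks, long goal)
    {
        long i, b;

        for (i = 0; i < nblocks; i++) {
            b = (goal + i) % nblocks;
            if (!(map[b >> 3] & (1 << (b & 7)))) {
                map[b >> 3] |= 1 << (b & 7);    /* mark it in use */
                return b;
            }
        }
        return -1;                   /* no free block left */
    }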

This gradual loss of performance is unfortunate, because the larger
partitions and longer filenames the extended file system supports are
useful to have around.

So, here is the extended file system defragmenter - recover all that
lost performance from your extfs partition.

For an idea of the performance gains you might obtain - the first time
I defragmented my file system, the time taken to boot my PC (from
switching on until the XDM X windows login prompt stabilises) dropped
from 37 seconds to 27 seconds.

As for the performance of the defragmenter itself - well, that first
version worked, but it thrashed my hard disk solid for over an hour
(this was for a 90MB partition).  The current version now runs in not
much over 5 minutes, and most of the accesses are sequential (ie. NO
thrashing).  Granted, the fragmentation is no longer severe, but those
5 or 6 minutes still include reading and writing over 70MB of the
partition.

Note - as of release 0.3, minix file systems are also supported.

HOW TO USE: and a few warnings.
===============================

Number one (this applies to all - repeat, ALL - major file system
operations):

*** BACK UP ANY IMPORTANT DATA BEFORE YOU START. ***

There may be bugs in the defragmenter.  You may have errors on your
disk which go undiscovered until edefrag tries to write to a bad block
that has never been accessed before.  There may be power glitches,
memory glitches, kernel errors.  [e]defrag does some major
reorganisation of disk data, and if for any reason it doesn't finish
its work, most of your file system is likely to be trashed.

*** YOU HAVE BEEN WARNED. ***

*** NEVER try to defragment an active or mounted file system.

It is often safe to use [e]fsck on a mounted fs; don't be conned into
thinking that the same will work for [e]defrag.  The file system will
be totally unusable while [e]defrag is working; and if this causes a
kernel crash, or if the fs interferes with the defragmenter as it
runs, you may well lose your entire partition.

This means that in order to defragment a root partition, you will
probably need to run [e]defrag from a boot floppy.

However, it IS totally safe to run [e]defrag in its readonly mode (for
testing) on an active partition.

*** Run [e]fsck on the partition first, to check its integrity.

Although I have been quite careful about the defragmenter's behaviour
on a corrupt file system (it should back down gracefully before doing
anything irreversible), it may well cause a lot of damage if the file
system is invalid in any way.

In particular, there is currently no handling of read/write errors in
the defragmenter.  The extfs version DOES understand the bad block
inode (and the special handling now works - as of version 0.3b), so if
you suspect you might have bad blocks, try running efsck -t (test for
bad blocks) before defragmenting.

However, if you have an IDE drive, you needn't worry; you should never
get any hd errors, as IDE drives dynamically remap bad blocks
internally, as they occur.  Until I have proper bad block support for
minix, it's probably unwise to try to defragment a suspect, non-IDE
minix partition.

*** Run [e]defrag -r next, just to be sure.

If there are any bugs in the defragmenter, running in readonly mode
first may find them ([e]defrag does quite a lot of self-checking as it
goes) before you lose any data.

*** Reinstall lilo after defragmenting a bootable partition.

Defragmentation moves data around the disk.  edefrag knows all of the
file system's internal pointers to this data, so these are adjusted as
needed to keep the file system intact.  Lilo, unfortunately, keeps its
own pointers to the location of kernel image files, so that the kernel
can be loaded before the file system is running.  (These pointers
are usually kept in /etc/lilo/map.)  If you defragment a partition
containing a lilo-bootable kernel image, you MUST reinstall lilo to
rebuild the now-invalid map file.
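
Putting those warnings together, a typical session on a non-root
partition might look something like the following (the device name is
purely illustrative, and the last step applies only if lilo boots a
kernel image from the partition concerned):

    # umount /dev/hda3        [ the fs must NOT be mounted ]
    # efsck /dev/hda3         [ check integrity first; add -t if you
                                suspect bad blocks ]
    # edefrag -r /dev/hda3    [ readonly trial run ]
    # edefrag /dev/hda3       [ the real thing ]
    [ now reinstall lilo, if a lilo-bootable kernel lives here ]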


Usage: edefrag [-Vdrsv] [-p pool_size] /dev/name

    -V : Prints the full CVS version id for the release.  Send me
         this information with any problem reports or suggestions.
    -s : Show superblock information.
    -v : Verbose.  Shows what the program is doing.  If used
         twice, gives extra progress information.
    -r : Readonly.  This opens the file system in readonly mode,
         which guarantees that your data will not be harmed.  This
         can be useful for testing purposes, especially for
         working out the best buffer pool size to use.
    -d : (If enabled at compile-time) Debug mode.

    The pool_size is the number of 1KB (disk block) buffers to
    allocate to the buffer pool while relocating the file system
    data.  (Default is 512; it cannot be set below 20.)

    Finally, /dev/name should be the device to be defragmented; an
    image file may also be used (for debugging purposes), as
    edefrag does not check that the file is a block device.
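
For instance, a hypothetical invocation asking for verbose progress
reports and a larger-than-default pool (remember that the partition
must not be mounted) might be:

    # edefrag -v -p 1024 /dev/hda3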


HINTS
=====

You may want to experiment with edefrag to find the best memory usage
before defragmenting.  Currently, the significant tables held in
memory by edefrag are:

    Relocation maps - 8 bytes per block.
    Inode table     - 64 bytes per inode.
    Inode maps      - 8 bytes per inode.

The buffer pool must be added on top of this.

For a typical file system, this works out at around 26K of memory
required per MB of disk space, or 2.6MB of memory for a 100MB disk
partition; plus the buffer pool.
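
As a rough check on that figure (the inode density here is an
assumption for illustration; real file systems vary), each MB of
partition costs about:

    1024 blocks  x  8 bytes       =   8K   (relocation maps)
    ~256 inodes  x  (64+8) bytes  = ~18K   (inode table and maps)
                                    ----
                                    ~26K per MB, plus the buffer pool.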

It is safe to use a swap file or partition if memory is tight (but NOT
one on the file system being defragmented!); this may not even affect
performance much, since during its first (mapping) phase, the
defragmenter accesses the inode table but not the buffer pool; during
the second (relocating) phase, the inode table is unused and the
buffer pool comes into play.

(Don't worry about the defragmenter suddenly running out of memory
during its work; all the memory required is allocated and initialised
before it starts operation, so any memory errors should occur before
the file system gets touched.)

The defragmenter tries as hard as possible to group reads and writes
into long sequential accesses.  Data being overwritten on the disk
gets put into a rescue buffer, and may soon just get written back
during the normal course of sequential writes.  However, if the buffer
pool is too small or the disk is highly fragmented, edefrag tries to
clear out the rescued data by seeing if its final destination is empty
yet.  (These are termed "migrate" writes; the data migrates from the
rescue pool to the output pool.)  If that fails to free enough space,
edefrag forces some of the rescue buffers out into empty blocks
("forcing" writes), from which the data will have to be re-read at
some point.

The upshot of this is that normal buffer writes are highly sequential
and efficient; "migrate" writes are slightly less sequential, but
still quite efficient; and "forcing" writes cause data to be read
twice, and from this point of view are quite inefficient.
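
As a very rough sketch of that decision in C (the names and structure
are invented for this README and do not come from the edefrag source):

    enum write_kind {
        WRITE_NORMAL,        /* part of the long sequential write sweep */
        WRITE_MIGRATE,       /* rescued data whose destination is now empty */
        WRITE_FORCE,         /* rescued data parked in a spare empty block */
        KEEP_IN_POOL         /* hold on, hoping the destination empties */
    };

    struct rescue_buf {
        unsigned long dest;  /* block where this data finally belongs */
        int rescued;         /* nonzero: grabbed before being overwritten */
    };

    /* How should this buffered block be written out? */
    enum write_kind classify_write(const struct rescue_buf *b,
                                   int dest_is_free, int pool_is_full)
    {
        if (!b->rescued)
            return WRITE_NORMAL;    /* cheap: in strict sequential order */
        if (dest_is_free)
            return WRITE_MIGRATE;   /* slightly out of order, still cheap */
        if (pool_is_full)
            return WRITE_FORCE;     /* expensive: will have to be re-read */
        return KEEP_IN_POOL;
    }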

Running edefrag with the -r option will scan your file system
non-destructively, and will report on the work it would have to do to
defragment the disk.  This facility can be used to adjust the
requested pool size, trading memory use against defragmenting
efficiency.

For example, I have just run:

    $ edefrag -r /dev/hda3            [ default 512K buffer pool ]

    [ ... superblock statistics deleted ... ]
    Relocation statistics:
    44807 buffer reads in 91 groups, of which:
          14004 read-aheads.
    44807 buffer writes in 91 groups, of which:
          0 migrations, 0 forces.

    $ edefrag -r -p 100 /dev/hda3

    [ ... superblock statistics deleted ... ]
    45299 buffer reads in 618 groups, of which:
          13310 read-aheads.
    45299 buffer writes in 618 groups, of which:
          202 migrations, 492 forces.

The first result indicates a higher efficiency with 512 buffers
than with 100.  However, even the second run would have been quite
quick; 492 forces out of a 90MB file system is not bad.  (By the way,
the reason the total number of writes is less than 90MB is that much
of my hard disk was already defragmented anyway.  8-)

If, however, my disk had been badly fragmented (as it used to be...) I
would probably have had to allocate around 2000-4000 buffers to get
good efficiency with few forced writes.

The tradeoff is that the less memory you allocate for pool buffers,
the more is available for the kernel to cache reads itself.  Since the
kernel reads entire tracks at a time, leaving space to the kernel
effectively gives extra "free" buffer reads.

I'm not yet quite sure whether it is more efficient to leave the
kernel with a healthily large cache for itself, or to allocate as much
as possible to edefrag's own buffering scheme (which is better
optimised for the task).  You may want to experiment here, and I would
be interested in hearing any conclusions you reach.  I am running with
16MB of RAM, so if you have less your mileage may vary.


WARRANTY:
=========

NONE.  Use at your own risk.  BACK UP ANY IMPORTANT DATA BEFORE YOU
START.

I have successfully run edefrag on my own 90MB root extfs partition at
home.  It has been tested on particularly hard jobs, such as
defragmenting a 1.44MB floppy with a buffer pool restricted to 20KB -
lots of extra writes are necessary to cope with a tiny buffer pool.
This release has never crashed for me, and has never lost me any data.
I am confident enough to use it fairly regularly, and when I back up
data before using it, I only back up stuff which cannot be reinstalled
from other sources.  I have tried as far as possible to ensure that
edefrag will not harm your data.  However, I cannot make ANY guarantee
that it won't.  Use it and enjoy it, but don't blame me if it ruins
your day.

Having said that, if you DO have problems, let me know and I'll try to
fix them for the next release.  (Even better, send me bug fixes!)


TO DO:
======

Bad block support for the minix file system is still missing (see
above).  When the mark 2 extfs is released by Remy Card, I should
support that, too.

I currently read in the entire inode table before starting, and write
it out again at the end.  This is really a throw-back to edefrag's
origins in efsck.  Since I no longer access the inodes at all after
initially calculating the disk relocation maps, I could probably get
away with just accessing inode data as needed, so using less memory.
Alternatively, the inode table and the buffer pool could share memory,
since the two are never used at the same time.

The verbose (-v) option could do with a little rationalisation, and an
interactive (maybe full screen?) mode showing progress would be nice.

The sync() frequency should probably be configurable at run-time.

===
Stephen Tweedie (sct@dcs.ed.ac.uk).